Abstract - We consider here a first step in applying CRANOF (Complexity Reducing Algorithm for Near Optimal Fusion), a new rigorously based complexity-reducing algorithm that produces estimates of underconstrained probabilities, to the problem of determining when a computer network is under attack. Essentially, CRANOF treats this issue as a formal analogue of the pairwise track correlation or similarity problem, comparing the current cyber-state history with each of various alternative classes of cyber-state histories relative to various features or attributes measuring various degrees of normality/abnormality.

Keywords - Information assurance, cyber-states, intrusion, second order probability, transitivity.

1. Introduction

For a fully netted force in the years 2010 and beyond, information systems will constitute a critical “center of gravity” and must be designed to be survivable. Fortunately, future Network-Centric Warfare (NCW) concepts will depend on widely dispersed network nodes that make a “hard-to-find” center of gravity. This naturally survivable and gracefully degradable architecture will still need an active and effective resident Information Assurance (IA) capability. NCW networks and related systems must be robust and able to absorb faults and intrusions without significant reductions in capability. While it cannot be assumed that IA will make NCW unassailable in 2010, IA will ensure that NCW systems are able to deliver the capabilities required by Naval Power Forward [18]. Activities such as Information Assurance, computer network defense and counter-deception will defend decision-making processes by neutralizing an adversary's perception management and intelligence collection. Two important technologies in Information Assurance are Strategic Intrusion Assessment (SIA) and Cyber Command and Control (CC2).

The Common Intrusion Detection Framework (CIDF) working group has stated [8] that two key problems in SIA are (a) the fusing and correlating of event and sensor information and (b) the tracking of attacks. Furthermore, in the CC2 program, a key problem is developing situation awareness. These three problems are all, of course, problems in higher levels of fusion. It is also our thesis that a close analogy exists between the problem of track-to-track correlation of kinematic targets and the problem of situation awareness via fusion of attack information about computer networks. In this analogy, the concept of alternative track histories corresponds to the concept of alternative cyber-state histories. By “cyber-state history”, we mean either a description of past attacks of various kinds, or the temporal patterns expected to be observed in various types of unprecedented attacks, or even non-attack disruptions. Consider then the general problem of cyber-attack classification and fusion, comparing the current cyber-state history with each of various alternative classes of cyber-state histories. In framing this problem, one can employ three basic types of random vectors, with corresponding conditional probabilities representing associated uncertainties: (a) correlation/similarity levels, (b) cyber-attribute (or cyber-feature) matching (or non-matching) of attributes, and (c) observed cyber-attribute data from each cyber-state history. The correlation level variable represents the degree of similarity between the current cyber-state history and alternative cyber-state histories representing various attack and non-attack disruptions. The goal is to estimate the conditional probability of various correlation levels, given the observed data, under circumstances where the joint probability measure of the basic random variables is only partially obtainable.

2. Underconstrained Probability Problem

2.1 Cyber intrusion detection as a special case of the matching problem

Strategic intrusion assessment for cyber command and control clearly depends upon a judgment (human, mechanistic, or some combination of both) that the degree of similarity between the current cyber state of affairs and any of a prescribed set of states during past intrusions (or even perceived future intrusions) is sufficiently high to warrant corresponding defensive action. As stated in the Introduction, this problem is analogous to that of track correlation, where a number of geolocation and non-geolocation attribute estimates, each subject to error, are considered for matching in order to determine whether or not the two tracks represent the same target of interest. (For an example of an approach prior to the use of CRANOF, see, e.g., [19].)

Analogies can be established with various other pattern matching problems, including fingerprint identification, photograph matching, use of clues left at crime scenes, and everyday recall of situations sufficiently similar to past events. Furthermore, this large class of matching problems is itself a special case of an even larger class of problems: the underconstrained probability class. To see this, consider the following problem, expressed in terms of conditional statements that need only be partially true.

Given:

“If attribute j for X and Y matches, for j = 1, ..., m, then X and Y are the same”;

“If we observe (possibly in error) $X_j$ for X and $Y_j$ for Y (with respect to attribute j), then X and Y match”, j = 1, ..., m;

Determine:

“If we observe $X_j$ for X and $Y_j$ for Y, j = 1, ..., m, then X and Y are the same”. (1)

In terms of corresponding conditional probability evaluations, and assuming, for simplicity, mutual independence of each of the conditionals in the second set of expressions:

$$P(a \mid b) = s, \qquad P(b \mid c) = t, \tag{2}$$

where, using standard boolean algebra terminology,

$$\begin{aligned} a &= \text{"X and Y are the same type of cyber attack / disruption"}, \\ b &= b_1 \,\&\, \cdots \,\&\, b_m, \\ b_j &= \text{"X and Y match with respect to attribute } j\text{"}, \quad j = 1, \ldots, m, \\ c &= c_1 \,\&\, \cdots \,\&\, c_m, \quad c_j = (X_j, Y_j), \quad j = 1, \ldots, m, \end{aligned} \tag{3}$$

and

$$P(b \mid c) = P(b_1 \mid c_1) \cdots P(b_m \mid c_m). \tag{4}$$

Thus, in summary, the matching problem, in quite simplified form, can be phrased as

$$\text{Given } P(a \mid b) = s,\ P(b \mid c) = t;\ \text{determine } P(a \mid c), \tag{5}$$

where s and t are known either exactly or approximately, and where a, $b_j$, b, $c_j$, c are all as in eq.(3).

While commonsense reasoning indicates that, in general, if s and t are reasonably high (i.e., close to unity), then so should P(a|c) be, it does not actually follow that a particular P will satisfy this property. This transitivity problem, a probabilistic version of the famous Aristotelian syllogism (in whose probabilistic formulation, if s = t = 1, the conclusion P(a|c) = 1 indeed holds), has provoked great controversy in the AI community. In fact, one can readily construct probability measures P such that s and t are arbitrarily close to (but not exactly) unity, while P(a|c) is close to, or even equal to, zero. (For background on this issue, see, e.g., Pearl [17].) Thus, “determine P(a|c)” in eq.(5) should be replaced by “estimate P(a|c) in some best sense”:

$$\text{Given } P(a \mid b) = s,\ P(b \mid c) = t;\ \text{estimate } P(a \mid c) \text{ in some best sense}. \tag{6}$$

If the above independence assumption leading to eq.(4) is felt to be unwarranted, but at least the marginal conditional probabilities $P(b_j \mid c_j)$ are known or estimable, then the matching problem can be modified (and appropriately relaxed) so that eq.(6) is replaced by

$$\text{Given } P(a \mid b) = s,\ P(b_j \mid c_j) = t_j,\ j = 1, \ldots, m;\ \text{estimate } P(a \mid c) \text{ in some best sense}, \tag{7}$$

assuming s and the $t_j$ are obtainable, j = 1, ..., m.

More generally, the underconstrained probability problem can be phrased as

$$\text{Given: } P(a_1 \mid b_1) = t_1, \ldots, P(a_m \mid b_m) = t_m;\ \text{estimate in some best sense } P(e \mid f). \tag{8}$$

Eq.(8) also arises in problems involving rule-based systems, where the given (not necessarily perfect) rules are of the form “if $b_j$, then $a_j$”, the given (possibly partially true) facts are $f_1, \ldots, f_n$, and it is desired to ascertain whether e is also a fact. In this case, one simply interprets f as the conjunction $f_1 \,\&\, \cdots \,\&\, f_n$.

Many other patterned special cases can also be considered, either as extensions of valid or invalid classical logic entailment schemes or as arising from AI considerations. (See, e.g., [14] for more details.) On the other hand, underconstrained probability problems need not be connected with any logical entailment considerations at all, but may simply arise from probabilistic models describing a multitude of military problems, including surveillance, search, detection, prediction, and reliability, among many others. All that is required is that the conditional probability of interest P(e|f) not be uniquely (or inconsistently) specified by the known probabilities of the given, or premise, collection of conditional probabilities. Also, note that a basic alternative to the exact threshold formulation of the estimation problem presented in eq.(8) is the corresponding lower bound threshold formulation:

$$\text{Given: } P(a_1 \mid b_1) \ge t_1, \ldots, P(a_m \mid b_m) \ge t_m;\ \text{estimate in some best sense } P(e \mid f). \tag{9}$$

Again, as in the transitivity and modified transitivity problems, the events $a_j$, $b_j$, e, f may all be related in some patterned way, or perhaps in no particular way.

Finally, a basic issue related to that in eq.(8) is to estimate P(e|f) when the thresholds are made to approach unity in some sense (such as uniformly, or at various prescribed rates).

2.2. Estimation aspect of CRANOF

One approach to estimating the desired conclusion probability P(e|f) in eq.(9), as originated with Adams [1, 2] (and usually provided in a more general conditional form, to be discussed below), is to take a pessimistic viewpoint: select some set of fixed lower bound probability thresholds $t_j$ corresponding to each premise, j = 1, ..., m, and determine, as a function of $\mathbf{t} = (t_1, \ldots, t_m)$, the minimum conclusion function to at least degree $\mathbf{t}$,

$$\begin{aligned} &\operatorname{minconc}(\{\text{if } b_1, \text{ then } a_1, \ldots, \text{if } b_m, \text{ then } a_m\};\ \text{if } f, \text{ then } e)(\mathbf{t}) \\ &\quad = \inf\{P(e \mid f) : P \text{ is a probability measure over } B \text{ and } P(a_j \mid b_j) \ge t_j,\ j = 1, \ldots, m\}. \end{aligned} \tag{10}$$

While, at first glance, the use of the minimum conclusion function appears reasonable, it can be shown that, for all thresholds $\mathbf{t}$ sufficiently close to, but distinct from, unity: (a) a number of key reasoning schemes fitting the general format of eq.(9), with “best estimate” interpreted via eq.(10), lead to the trivial value 0; this includes transitivity [16, 3, 4]. (b) For any reasoning scheme, the limiting value of the estimate of P(e|f) is either 0 or 1, thus not allowing for nontrivial “degrees of validity or confidence”. (Again, see [1, 2, 4, 5] for details.)

A less pessimistic approach, that of CRANOF, addresses the general underconstrained probability problem as explicated in Section 2.1, taking into account both optimality of solution and implementation complexity. Here, one replaces the minimum conclusion function above by the mean conclusion function [4, 5]. Choose some prior distribution D over the possible P's of interest (such as one corresponding to a uniform distribution, or various biased distributions), and denote, as usual, conditional expectation with respect to this choice D of P's by $E_D(\cdot \mid \cdot)$. Then the lower bound threshold formulation is

$$\begin{aligned} &\operatorname{meanconc}_1(\{\text{if } b_1, \text{ then } a_1, \ldots, \text{if } b_m, \text{ then } a_m\};\ \text{if } f, \text{ then } e)(\mathbf{t}) \\ &\quad = E_D(P(e \mid f) \mid P \text{ is a probability measure over } B \text{ and } P(a_j \mid b_j) \ge t_j,\ j = 1, \ldots, m), \end{aligned} \tag{11a}$$

with the obvious analogue holding for the exact threshold formulation

$$\begin{aligned} &\operatorname{meanconc}_2(\{\text{if } b_1, \text{ then } a_1, \ldots, \text{if } b_m, \text{ then } a_m\};\ \text{if } f, \text{ then } e)(\mathbf{t}) \\ &\quad = E_D(P(e \mid f) \mid P \text{ is a probability measure over } B \text{ and } P(a_j \mid b_j) = t_j,\ j = 1, \ldots, m). \end{aligned} \tag{11b}$$
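As a rough numerical illustration of eq.(11a), the following sketch estimates meanconc_1 for the transitivity problem of eq.(6) (premises “if b, then a” and “if c, then b”; conclusion “if c, then a”) by simple rejection sampling. It assumes a uniform prior D, i.e. a Dirichlet(1, ..., 1) distribution over the seven relative atoms of this problem; the atom ordering, sample size, and seed are arbitrary choices. Monte Carlo is used here only for illustration, not as the integration method of [10], and the exact threshold form (11b), which conditions on a set of measure zero, cannot be handled this naively.

```python
import numpy as np

# Illustrative ordering of the seven transitivity atoms:
# 0: a&b&c  1: a&b&c'  2: a&b'&c  3: a'&b&c  4: a'&b&c'  5: a'&b'&c  6: b'&c'
AB, B, BC, C, AC = [0, 1], [0, 1, 3, 4], [0, 3], [0, 2, 3, 5], [0, 2]


def cond(p, num, den):
    """Row-wise conditional probability P(num | den) from atom probabilities."""
    return p[..., num].sum(axis=-1) / p[..., den].sum(axis=-1)


def meanconc1_mc(s, t, n_samples=200_000, seed=0):
    """Rejection-sampling estimate of meanconc_1 (eq. 11a) for transitivity,
    assuming a uniform Dirichlet(1, ..., 1) prior D over the 7 atoms."""
    rng = np.random.default_rng(seed)
    p = rng.dirichlet(np.ones(7), size=n_samples)
    keep = (cond(p, AB, B) >= s) & (cond(p, BC, C) >= t)  # lower-bound premises
    accepted = p[keep]
    return cond(accepted, AC, C).mean(), len(accepted)


estimate, n_accepted = meanconc1_mc(0.8, 0.8)
print(f"meanconc_1(0.8, 0.8) ~ {estimate:.3f} ({n_accepted} accepted samples)")
```

Rejection sampling becomes wasteful as the thresholds approach unity, since few samples satisfy all premises; that regime is exactly where the asymptotic results discussed later become useful.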


The conditional expectations in eq.(11) must be further explained to make sense. First, consider the boolean algebra B of all events naturally generated from the key components making up the conditional expressions “if $b_j$, then $a_j$”, j = 1, ..., m, as well as “if f, then e” (or equivalently, the conditional probabilities $P(a_j \mid b_j)$, j = 1, ..., m, and P(e|f)), i.e., from the class of events

$$C = \{a_j \,\&\, b_j,\ b_j : j = 1, \ldots, m\} \cup \{e \,\&\, f,\ f\}. \tag{12}$$

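Define, for ease of notation here,

$$a_0 = e, \qquad b_0 = f, \tag{13}$$

and for each j = 0, 1, ..., m, define

$$\omega_{j,1} = a_j \,\&\, b_j; \qquad \omega_{j,2} = a_j' \,\&\, b_j; \qquad \omega_{j,3} = b_j'. \tag{14}$$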

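In turn, consider the class G of functions g, where

$$G = \{g : g \text{ maps } \{0, 1, \ldots, m\} \text{ into } \{1, 2, 3\}\} = \{1, 2, 3\}^{\{0, 1, \ldots, m\}}, \tag{15}$$

and for each g in G, define the relative atom $\omega_g$ determined by g acting on C as

$$\omega_g = \omega_{0,g(0)} \,\&\, \omega_{1,g(1)} \,\&\, \cdots \,\&\, \omega_{m,g(m)}. \tag{16}$$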


Next, define the class of all such nonvacuous relative atoms as

$$A = \{\omega_g : g \in G \text{ and } \omega_g \ne \emptyset\}. \tag{17}$$

Then, it is easily verified that B consists of all finite disjoint disjunctions (∨) of the $\omega_g$ in A. Hence, any probability measure P that is well defined with respect to the conditional expressions making up the problem in eqs.(8) or (9) (or equivalently, well defined upon the key events making up C) is uniquely determined by its values over A. In fact, P can be naturally identified as a probability function over A, or equivalently, as the q by 1 probability vector (indicating matrix or vector transpose by ${}^T$)

$$\mathbf{P} = (P(\omega))_{\omega \in A}; \qquad q = \operatorname{card}(A). \tag{18}$$
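The construction in eqs.(13)-(18) can be checked mechanically on a small case. The sketch below assumes that the basic events are logically independent (so every truth assignment is possible), labels each assignment by the tuple (g(0), ..., g(m)) read off from eq.(14), and groups assignments with equal labels; each group is then one nonvacuous relative atom of eq.(17). The helper name `relative_atoms` is hypothetical. For transitivity, the case treated in Section 3, it gives q = 7.

```python
from itertools import product


def relative_atoms(conditionals, n_vars):
    """Group truth assignments into the nonvacuous relative atoms of eq.(17).

    conditionals: list of (consequent, antecedent) pairs (a_j, b_j), j = 0..m,
    each a predicate on a truth assignment. Returns {label: [assignments]},
    where label = (g(0), ..., g(m)) with g(j) chosen as in eq.(14).
    """
    atoms = {}
    for w in product((True, False), repeat=n_vars):
        label = tuple(
            3 if not ante(w) else (1 if cons(w) else 2)  # omega_{j,3}, _{j,1}, _{j,2}
            for cons, ante in conditionals
        )
        atoms.setdefault(label, []).append(w)
    return atoms


# Transitivity, w = (a, b, c): j = 0 is the conclusion "if c, then a" (e = a, f = c),
# followed by the premises "if c, then b" and "if b, then a".
def a(w):
    return w[0]


def b(w):
    return w[1]


def c(w):
    return w[2]


atoms = relative_atoms([(a, c), (b, c), (a, b)], n_vars=3)
print(len(atoms))  # 7: only a&b'&c' and a'&b'&c' merge (into b'&c')
```

Enumerating truth assignments is exponential in the number of basic events, so this is only a check for small problems, not a practical way to build A.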


In turn, totally ordering all q relative atoms and identifying them as element 1, ..., element q, any prior second order probability distribution D of probability measures P for the problem addressed in eqs.(8) or (9) can be naturally identified as an ordinary probability distribution over the space of possible $\mathbf{P}$'s in eq.(18), i.e., over the q-simplex in surface form

$$S_q = \{\mathbf{P} : \mathbf{P}^T = (p_1, \ldots, p_q),\ 0 \le p_j \le 1,\ p_1 + \cdots + p_q = 1\}, \tag{19}$$

or equivalently over the full (q-1)-dimensional simplex

$$S(q) = \{\mathbf{P} : \mathbf{P}^T = (p_1, \ldots, p_{q-1}),\ 0 \le p_j \le 1,\ p_1 + \cdots + p_{q-1} \le 1\}. \tag{20}$$
One natural choice of prior D over S(q) is the uniform one. More generally, it can be shown that, in a natural sense, the optimal family of possible priors D is the Dirichlet family [12]. In any case, the above argument demonstrates explicitly how the conditional expectation in eq.(11) makes sense. It also follows that the optimal estimate of P(e|f), which is the expectation of the Bayesian posterior, will in general be uniquely achieved and, from classical results due originally to Wald [20], will therefore be a decision-theoretically admissible estimator relative to the usual squared-error loss function. The robustness properties of such posterior estimators are also well known [13, 16].

Another approach to interpreting “estimate in some best sense” in eqs.(8) or (9) is the use of maximal entropy [9]. Alternatively, the term “estimate in some best sense” in these equations can be avoided altogether by taking an upper/lower probability bound approach [7, 21]. (Ongoing research by the authors is investigating the relationship between the Bayesian posterior approach of the mean conclusion function and these other approaches.)

Returning to eqs.(8) and (9), one can first show that the basic relation between the exact and lower bound formulations is that the latter is a weighted integral of the former, and that the former can be expressed succinctly as the ratio of two well-defined surface integrals which, in turn, can be evaluated as ordinary integrals [10], improving and extending earlier results.

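Often, it is more convenient to consider, in place of either $\operatorname{meanconc}_j$ function in eq.(11), the natural corresponding plug-in forms

$$\operatorname{meanconc}_3(\{\text{if } b_1, \text{ then } a_1, \ldots, \text{if } b_m, \text{ then } a_m\};\ \text{if } f, \text{ then } e)(\mathbf{t}) = P^*_3(e \mid f); \tag{21a}$$

$$P^*_3 = E_D(\mathbf{P} \mid \mathbf{P} \text{ is a probability vector in } S(q) \text{ and } P(a_j \mid b_j) \ge t_j,\ j = 1, \ldots, m); \tag{22a}$$

$$\operatorname{meanconc}_4(\{\text{if } b_1, \text{ then } a_1, \ldots, \text{if } b_m, \text{ then } a_m\};\ \text{if } f, \text{ then } e)(\mathbf{t}) = P^*_4(e \mid f); \tag{21b}$$

$$P^*_4 = E_D(\mathbf{P} \mid \mathbf{P} \text{ is a probability vector in } S(q) \text{ and } P(a_j \mid b_j) = t_j,\ j = 1, \ldots, m), \tag{22b}$$

noting that, by a straightforward convexity argument, both $P^*_3$ and $P^*_4$ are also probability vectors in S(q).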
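The difference between the mean-of-ratios form (11a) and the plug-in form (21a)-(22a) can be seen numerically. The sketch below uses an illustrative setup (transitivity, a uniform Dirichlet(1, ..., 1) prior over the seven relative atoms, rejection sampling with an arbitrary sample size and seed). It averages P(a|c) over the accepted samples to estimate meanconc_1, and separately averages the accepted probability vectors to estimate $P^*_3$ before taking the conditional, which gives meanconc_3. The two numbers are typically close but need not coincide, since the expectation of a ratio is not the ratio of expectations.

```python
import numpy as np

# Illustrative ordering of the seven transitivity atoms:
# 0: a&b&c  1: a&b&c'  2: a&b'&c  3: a'&b&c  4: a'&b&c'  5: a'&b'&c  6: b'&c'
AB, B, BC, C, AC = [0, 1], [0, 1, 3, 4], [0, 3], [0, 2, 3, 5], [0, 2]


def cond(p, num, den):
    """P(num | den) for a single probability vector or for each row of a matrix."""
    return p[..., num].sum(axis=-1) / p[..., den].sum(axis=-1)


rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(7), size=200_000)       # uniform prior D over the simplex
s, t = 0.8, 0.8
accepted = p[(cond(p, AB, B) >= s) & (cond(p, BC, C) >= t)]

meanconc1 = cond(accepted, AC, C).mean()          # E_D[P(a|c) | premises], eq.(11a)
p_star3 = accepted.mean(axis=0)                   # estimate of P*_3, eq.(22a)
meanconc3 = cond(p_star3, AC, C)                  # plug-in P*_3(a|c), eq.(21a)
print(f"meanconc_1 ~ {meanconc1:.4f}   meanconc_3 ~ {meanconc3:.4f}")
print("P*_3 sums to", p_star3.sum())              # a probability vector, by convexity
```

The plug-in form needs only the posterior mean vector, which is one reason it is often the more convenient quantity to carry through later computations.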

2.3. Asymptotic and complexity reduction aspect of CRANOF

The actual evaluation of the seemingly simple integrals mentioned at the end of Section 2.2 can, in general, be very computationally intensive, due to the possible presence of a large number (m) of premise constraints $P(a_j \mid b_j) = t_j$. On the other hand, if it is reasonable to assume that all of the thresholds are sufficiently close to unity, relatively simple results have been obtained by Bamber under the assumption that the prior D corresponds to a uniform distribution over $S_q$ [3]. Returning to the non-limiting threshold case, a basic justification has been derived for an approximation to meanconc whereby the original premise class is replaced by a single (albeit complex) rule that, in the asymptotic threshold case, provides equivalent behavior of meanconc. This requires a certain sufficiency condition to be satisfied ([4], Section 6). (For a basic formulation and application to the modified transitivity problem, see Section 3 below.)


3. CRANOF applied to transitivity and modified transitivity

3.1. Basic Results

First, it should be remarked that a related transitivity-like application of CRANOF to track correlation is provided in [6].

Returning to the more specific class of cyber-state problems considered here, the transitivity problem in eq.(6) can be fully evaluated by specializing the previously discussed surface integral techniques [10], assuming D corresponds to a uniform distribution over S(q). Here, q = 7, since it is readily shown that the conditionals “if c, then b”, “if b, then a”, and “if c, then a” lead to

$$A = \{a\&b\&c,\ a\&b\&c',\ a\&b'\&c,\ a'\&b\&c,\ a'\&b\&c',\ a'\&b'\&c,\ b'\&c'\}$$

and hence card(A) = 7:

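$$\begin{aligned} &\operatorname{meanconc}(\{\text{if } c, \text{ then } b,\ \text{if } b, \text{ then } a\};\ \text{if } c, \text{ then } a)(s, t) \\ &\quad = E_D(P(a \mid c) : P \text{ is a probability measure over } B \text{ and } P(a \mid b) = s,\ P(b \mid c) = t) \\ &\quad = s t + \frac{1 - t}{2} + h(s, t), \end{aligned} \tag{23}$$

where

$$\begin{aligned} h(s,t) &= \frac{h_1(s,t)}{h_2(s,t)}; \qquad h_1(s,t) = s\,(2s - 1)(1 - s)\, t\,(1 - t^2); \\ h_2(s,t) &= t + 2t^2 + h_3(s,t); \qquad h_3(s,t) = s\,(1 - s)(1 - t)(2 + 3t + t^2). \end{aligned} \tag{24}$$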

For s, t reasonably close to unity, one can use the simplifying approximation

$$h(s,t) \approx \tfrac{2}{3}(1 - s)(1 - t). \tag{25}$$

The solution to the above problem can be modified so that the probabilistic inputs are replaced by linguistic, population-conditioned ones. (For details, see [11].)
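For quick evaluation, eqs.(23)-(25) can be coded directly. This sketch assumes the reading $h_1 = s(2s-1)(1-s)\,t\,(1-t^2)$, $h_2 = t + 2t^2 + h_3$, $h_3 = s(1-s)(1-t)(2+3t+t^2)$ of eq.(24); it is a transcription for illustration only, valid where $h_2 > 0$ (e.g. for 0 ≤ s ≤ 1 and 0 < t ≤ 1), and the function names are hypothetical.

```python
def h_exact(s, t):
    """h(s, t) of eq.(24), under the reading stated above."""
    h1 = s * (2 * s - 1) * (1 - s) * t * (1 - t**2)
    h3 = s * (1 - s) * (1 - t) * (2 + 3 * t + t**2)
    h2 = t + 2 * t**2 + h3
    return h1 / h2


def h_approx(s, t):
    """Simplifying approximation of eq.(25), for s, t near unity."""
    return (2.0 / 3.0) * (1 - s) * (1 - t)


def meanconc_transitivity(s, t, approx=False):
    """Eq.(23): s*t + (1 - t)/2 + h(s, t), uniform prior over S(7)."""
    h = h_approx(s, t) if approx else h_exact(s, t)
    return s * t + (1 - t) / 2 + h


for s, t in [(0.8, 0.8), (0.9, 0.9), (0.95, 0.95), (0.99, 0.9)]:
    print(f"s={s}, t={t}: exact={meanconc_transitivity(s, t):.5f}, "
          f"approx={meanconc_transitivity(s, t, approx=True):.5f}")
```

At s = t = 0.95, for example, h itself is of order $10^{-3}$, so the estimate is dominated by $st + (1-t)/2$; the $(1-t)/2$ term reflects that, given b' and c, the premises say nothing about a.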

While closed-form results have been obtained not only for transitivity, as in eqs.(23) and (24), but for a whole host of other types of estimation schemes stemming from classical logic and rule-based system considerations (such as contraposition, positive conjunction, cautious monotonicity, abduction, strengthening of antecedent, etc.; again, see [4]), many other more complicated arguments cannot be so readily evaluated. But, as mentioned earlier, if a certain sufficiency condition is satisfied, then the argument in question can, in an asymptotically equivalent threshold sense, be replaced by a singleton premise class, which in turn leads to a closed-form evaluation.

Theorem 1. (Bamber & Goodman [4].)

Consider the problem in eq.(8), addressed by use of the meanconc function in eq.(11b). Suppose that the prior distribution D has a probability density function over S(q) that is continuous and uniformly bounded away from both zero and infinity, and that the condition

$$a_j \,\&\, (b_1 \Rightarrow a_1) \,\&\, \cdots \,\&\, (b_m \Rightarrow a_m) \ne \emptyset, \quad \text{for } j = 1, \ldots, m, \tag{26}$$

holds, where the material implication operator ⇒ is defined, as usual, by

$$b_j \Rightarrow a_j = b_j' \vee a_j = b_j' \vee (a_j \,\&\, b_j). \tag{27}$$
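Then, letting here $\mathbf{t} = (t, t, \ldots, t)$,

(i)

$$\begin{aligned} &\lim_{t \to 1} \left[\operatorname{meanconc}_3(\{\text{if } b_1, \text{ then } a_1, \ldots, \text{if } b_m, \text{ then } a_m\};\ \text{if } f, \text{ then } e)(\mathbf{t})\right] \\ &\quad = \lim_{t \to 1} \left[\operatorname{meanconc}_4(\{\text{if } b_1, \text{ then } a_1, \ldots, \text{if } b_m, \text{ then } a_m\};\ \text{if } f, \text{ then } e)(\mathbf{t})\right] \\ &\quad = \lim_{t \to 1} \left[\operatorname{meanconc}_2(\{\text{if } \beta, \text{ then } \alpha\};\ \text{if } f, \text{ then } e)(t)\right], \end{aligned} \tag{28}$$

where

$$\beta = b_1 \vee \cdots \vee b_m; \qquad \alpha = \beta \,\&\, (b_1 \Rightarrow a_1) \,\&\, \cdots \,\&\, (b_m \Rightarrow a_m). \tag{29}$$

(ii) Analogous relations hold for the limiting forms of $\operatorname{meanconc}_j$ for j = 1, 2.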

Theorem 2. (Bamber & Goodman [4, 6])

Make the same assumptions as in Theorem 1, now with D chosen as D(τ), where D(τ) is the Dirichlet probability distribution with parameter vector $\tau = (\tau_1, \ldots, \tau_q) \ge (1, 1, \ldots, 1)$, and with α and β as in eq.(29). Suppose, without loss of generality, that the class of relative atoms A, defined in eq.(17), has been indexed so that

$$A = \{\omega_1, \ldots, \omega_{q-1}, \omega_q\}, \tag{30}$$

$$\omega_q = \beta' \,\&\, f' = b_1' \,\&\, \cdots \,\&\, b_m' \,\&\, f'. \tag{31}$$

For any nonvacuous event c in B, the boolean algebra determined by A, let I(c) denote the unique minimal index set, $\emptyset \ne I(c) \subseteq \{1, \ldots, q-1, q\}$, such that

$$c = \bigvee_{j \in I(c)} \omega_j \quad \text{(disjoint disjunction)}. \tag{32}$$

Correspondingly, define

$$\tau(c) = \sum_{j \in I(c)} \tau_j. \tag{33}$$

Then one can replace, in the limiting approximating sense of Theorem 1, $\operatorname{meanconc}_4(\{\text{if } b_1, \text{ then } a_1, \ldots, \text{if } b_m, \text{ then } a_m\};\ \text{if } f, \text{ then } e)(\mathbf{t})$ by

$$\operatorname{meanconc}_2(\{\text{if } \beta, \text{ then } \alpha\};\ \text{if } f, \text{ then } e)(t) = \frac{Q(t)}{R(t)}, \tag{34}$$

where

$$Q(t) = \frac{\tau(\beta)\,\tau(e\&f\&\alpha)}{\tau(\alpha)}\, t + \frac{\tau(\beta)\,\tau(e\&f\&\alpha'\&\beta)}{\tau(\alpha'\&\beta)}\,(1 - t) + \tau(e\&f\&(\beta' - \omega_q)), \tag{35}$$

$$R(t) = \frac{\tau(\beta)\,\tau(f\&\alpha)}{\tau(\alpha)}\, t + \frac{\tau(\beta)\,\tau(f\&\alpha'\&\beta)}{\tau(\alpha'\&\beta)}\,(1 - t) + \tau(f\&(\beta' - \omega_q)), \tag{36}$$

noting the relation, from eqs.(32) and (33),

$$\tau(e\&f\&(\beta' - \omega_q)) = \tau(e\&f) - \tau(e\&f\&\beta) - \tau(e\&f\&\omega_q). \quad \blacksquare$$

Corollary 1. (New application of Theorem 2 to the modified transitivity problem in eq.(7))

Consider the modified transitivity problem given in eq.(7), where the events a, $b_j$, b, $c_j$, c are all interpreted as before in eq.(3). (Here, we no longer require any use of eq.(4).) In general, there are $q = 2(2^m - 1) + 5$ relative atoms here, and all assumptions of Theorem 1 hold, including eq.(26). Thus, making again the assumptions of Theorem 2, applied to the problem here, it follows that the approximating results, as well as the computations in eqs.(34)-(36), all hold here, where a common threshold t is determined from the initially given s and $t_j$ in eq.(7) (such as via a weighted average). In particular,

$$\begin{aligned} \beta &= b \vee c_1 \vee \cdots \vee c_m; \\ \alpha &= (a\&b) \vee [\,b'\&(c_1 \vee \cdots \vee c_m)\&(c_1 \Rightarrow b_1)\&\cdots\&(c_m \Rightarrow b_m)\,]; \\ \alpha'\&\beta &= (a'\&b) \vee (b_1'\&c_1) \vee \cdots \vee (b_m'\&c_m); \\ \beta' &= b'\&c_1'\&\cdots\&c_m', \end{aligned} \tag{37}$$

where here e = a and f = c, so that

$$\begin{aligned} e\&f\&\alpha &= a\&c\&(c_1 \Rightarrow b_1)\&\cdots\&(c_m \Rightarrow b_m) = a\&b\&c; \\ e\&f\&\alpha'\&\beta &= a\&b'\&c; \qquad e\&f\&\beta' = \emptyset; \\ f\&\alpha &= a\&b\&c; \\ f\&\alpha'\&\beta &= (a'\&b\&c) \vee (b'\&c); \qquad f\&\beta' = \emptyset. \end{aligned} \tag{38}$$

In turn, using the definition in eq.(33) in a straightforward way, eqs.(37) and (38) lead to the corresponding additive computations for τ(α), τ(α'&β), τ(e&f&α), τ(e&f&α'&β), τ(e&f&β'), τ(f&α), τ(f&α'&β), τ(f&β'), etc., yielding the full evaluation of eqs.(35) and (36), and hence of eq.(34). ■


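To illustrate the singleton-rule surrogate of eqs.(33)-(36), the sketch below represents events as Python sets of relative-atom labels and computes Q(t)/R(t). As a check it uses the simplest instance, m = 1 in eq.(7) (ordinary transitivity), with a uniform parameter vector τ = (1, ..., 1) over the seven atoms of Section 3.1. For that instance, working from eq.(29) ourselves, β = b ∨ c and α = a&b, and ω_q is taken to be the atom b'&c' lying outside both β and f. The function names and atom labels are hypothetical, and the result is only the asymptotic surrogate of Theorems 1 and 2, so at finite t it need not agree with eq.(23).

```python
def tau_of(event, tau):
    """Eq.(33): tau(c) = sum of the Dirichlet parameters of the atoms in c."""
    return sum(tau[w] for w in event)


def singleton_rule_estimate(t, tau, alpha, beta, e_and_f, f, omega_q):
    """Q(t)/R(t) of eqs.(34)-(36); every event is a set of atom labels."""
    alpha_c_beta = beta - alpha          # alpha' & beta (alpha lies inside beta)
    beta_c = set(tau) - beta             # beta'

    def term(x):
        # Q(t) when x = e&f, R(t) when x = f.
        return (tau_of(beta, tau) * tau_of(x & alpha, tau) / tau_of(alpha, tau) * t
                + tau_of(beta, tau) * tau_of(x & alpha_c_beta, tau)
                / tau_of(alpha_c_beta, tau) * (1 - t)
                + tau_of(x & (beta_c - omega_q), tau))

    return term(e_and_f) / term(f)


# Hypothetical labels for the seven transitivity atoms of Section 3.1.
ATOMS = ["abc", "abc'", "ab'c", "a'bc", "a'bc'", "a'b'c", "b'c'"]
tau = {w: 1.0 for w in ATOMS}                  # uniform prior, tau = (1, ..., 1)
beta = set(ATOMS) - {"b'c'"}                   # b v c
alpha = {"abc", "abc'"}                        # a & b
e_and_f = {"abc", "ab'c"}                      # a & c
f = {"abc", "ab'c", "a'bc", "a'b'c"}           # c
omega_q = {"b'c'"}                             # b' & c'

for t in (0.8, 0.9, 0.99, 1.0):
    print(t, singleton_rule_estimate(t, tau, alpha, beta, e_and_f, f, omega_q))
```

Here Q(t) = 3t + 1.5(1 - t) and R(t) = 3t + 4.5(1 - t), so the ratio tends to 1 as t → 1, in line with the limit statement of Theorem 1.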
4. Choice of attributes and other issues in applying CRANOF

In implementing either the transitivity or the less restrictive modified transitivity approach, one must choose the most appropriate system features or attributes for comparing normal or attacked cyber states relative to these categories. One possible set is furnished in [15], where, in effect, the relevant attributes under consideration for intrusion detection were associated with sendmail system traces, with corresponding domains given by the possible post-processing percentages of abnormal sequences of system traces. These include traces of the sscp (sunsendmailcp), syslog-remote, syslog-local, and decode intrusions.
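The attribute domains mentioned above (percentages of abnormal sequences in system traces) can be illustrated with a simple sliding-window comparison against a database of windows seen in normal runs. This is a simplified stand-in and not necessarily the procedure of [15]; the window length k = 3, the integer encoding of system calls, and the function names are all assumptions made for the example.

```python
def windows(trace, k):
    """All contiguous length-k windows of a system-call trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def abnormal_percentage(normal_traces, test_trace, k=3):
    """Percentage of windows in test_trace never seen in any normal trace."""
    normal = {w for trace in normal_traces for w in windows(trace, k)}
    test = windows(test_trace, k)
    if not test:
        return 0.0
    return 100.0 * sum(w not in normal for w in test) / len(test)


# Hypothetical integer-coded system-call traces.
normal_runs = [[1, 2, 3, 4, 1, 2, 3, 4]]
suspect_run = [1, 2, 3, 9, 1, 2, 3]
print(abnormal_percentage(normal_runs, suspect_run))  # 60.0
```

A value computed this way for each candidate attribute could then serve as the observed data $c_j$ in eq.(7), with the threshold for "match" being a separate modeling choice.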

Additional issues to be addressed in future work include: (i) strategies for obtaining the empirical distributions necessary for obtaining the initial s and the $t_j$ in eq.(7), as discussed in Corollary 1; (ii) quantitative sensitivity analysis of estimated correlation levels to the choice of attribute sets and the number of attributes; and (iii) complexity problems arising from the number of alternative states considered for pairwise comparison with the current state.